Hip-hop music has been a major cultural force for several decades, with its roots in the African American and Latino communities of the Bronx in the 1970s. Over time, the genre has evolved drastically. In this course I’ve been researching the differences and similarities between popular ‘old school hip-hop’ from the 90s and popular modern-day rap from around 2017 to 2023. Rap is now considered one of the most popular genres in contemporary music. I’m interested in how the music has changed and whether modern-day rap differs significantly from 90s hip-hop in ways that can be confirmed with computational musicology, but I’m also interested in possible similarities. My corpus therefore contains a merged group of tracks from both periods. In the analysis presented here, I refer to the popular hip-hop music of the 90s as ‘old-school’ and the popular rap music from around 2017 to 2023 as ‘modern’.
As I said, the natural groups and comparison points are popular tracks from both these time spans. One of my hypotheses is that loudness has increased because of the way music and hip-hop beats are produced these days, but I am very unsure how other factors like energy, valence and danceability have changed between these two groups. I also assume that modern rap features more electronic and digital production elements than 90s hip-hop. I hope to determine these differences by looking at various aspects, such as timbre.
One strength of my corpus is that I used popularity charts and Spotify stream counts to check which tracks were the most popular in each time span, the 90s and around 2017 to 2023, and used these as a reference point. This rules out a certain form of subjectivity, namely my own taste, because I only selected tracks that are popular and have a lot of streams. So although the two time spans differ a lot and hip-hop/rap has evolved drastically, my comparison groups have one thing in common: both consist only of very popular tracks. This also raises some interesting research questions. On the one hand, do the popular tracks within each time span form a coherent group, or is there a lot of fluctuation? On the other hand, can we identify common characteristics for the whole corpus?
The Spotify API track features provide a lot of information about how the tracks are classified and which musical characteristics and qualities they convey. These track features therefore offer a reliable foundation for starting this research. You can find the playlist of the corpus here.
This is an AI-generated image, made on Craiyon, of two very well-known artists: on the left we see Post Malone and on the right Jay-Z. A fun blend of two iconic figures from different eras, with Jay-Z representing the 90s hip-hop sound and Post Malone symbolizing the modern-day rap style.
To begin visualising the selected corpus, let’s take a look at two variables: energy and valence. How do they differ between 90s hip-hop and modern-day rap, and is there a correlation between the two? Both variables range from 0.0 to 1.0. Valence refers to the emotional quality the music conveys, with 1.0 being very positive, happy and/or uplifting and 0.0 being angry, regretful or sad. Energy gives us an idea of the intensity and activity of the track; a really energetic track would feel rousing, for example. Valence fluctuates more in the modern rap music, while the old-school graph has more tracks with higher valence, consisting predominantly of values above 0.5. Overall, the most popular hip-hop/rap songs from both time spans generally have high energy values, most of them around 0.5 or higher. It is difficult to determine a clear correlation between energy and valence in either period: in the old-school graph the highest energy values are located at both the lowest and the highest values of valence. One possible explanation is that either low or high valence goes together with high energy. There are exceptions to this, though, such as ‘Check the Rhime’ by A Tribe Called Quest, released in 1991 (the blue dot in the upper left corner of the “old-school” graph). One clear difference between the two graphs is this: in the modern period the highest energy values are not tied to either high or low valence, whereas in the old-school period the highest energy occurs at the extremes of valence, both high and low.
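The difficulty of finding a clear correlation when energy is high at both valence extremes can be illustrated with a small calculation. The following is a minimal Python sketch, not the code used to produce the graphs; the `pearson` helper and the five (valence, energy) pairs are hypothetical and chosen only to mimic a U-shaped pattern.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical values, NOT taken from the corpus:
# energy is highest at both the lowest and the highest valence.
valence = [0.1, 0.3, 0.5, 0.7, 0.9]
energy = [0.9, 0.7, 0.6, 0.7, 0.9]

r = pearson(valence, energy)
print(f"Pearson r = {r:.3f}")
```

For a symmetric U-shape like this one, the linear correlation coefficient comes out close to zero even though energy clearly depends on valence, which is why a scatter plot is more informative here than a single correlation value.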
You can hover over the dots of the scatter plot to see the track names, the exact release date and the exact values for both variables; if you are familiar with 90s hip-hop or popular rap songs from recent years, you will probably recognize some of the track names!
These graphs examine the relationship between speechiness and danceability in both time periods (you can highlight one of the two periods by double-clicking on it in the legend). Danceability measures how suitable a track is for dancing; it is primarily based on rhythm, tempo and beat strength. It ranges from 0.0 to 1.0, with higher values indicating a greater likelihood of the track being danceable. Speechiness, on the other hand, measures the presence of spoken words in a track. It also ranges from 0.0 to 1.0, with higher values indicating that the track contains more ‘spoken word-like’ vocals. Without going into too much detail here, we are looking at two graphs that contain exactly the same data; the only difference is the trend lines, which are calculated with two different methods called LOESS and GAM. One interesting and very extreme outlier here is ‘Yes Indeed’ by Lil Baby featuring Drake, which has both the highest danceability (almost at the maximum of 1.0) and the highest speechiness of the whole corpus. Because of this outlier I’ve used two different methods for determining the trend lines. For the old-school period, both methods show a relatively stable relationship between speechiness and danceability, with an upward trend at the beginning, peaking between the speechiness values of 0.2 and 0.3. These findings suggest a non-linear trend in the old-school data, where an increase in speechiness is associated with an initial increase in danceability, followed by a decrease. The cluster of data points in the old-school period (between 0.2 and 0.3 speechiness) also aligns nicely with the peaks of both lines. With the LOESS method, the modern trend line crosses the old-school trend line at around 0.23 speechiness. However, looking at this modern trend line, we can see that the outlier ‘Yes Indeed’ has a strong impact on the trend that LOESS produces.
The GAM method, on the other hand, shows a horizontal line, which would indicate that speechiness doesn’t noticeably influence danceability in the modern period, whilst the old-school period shows a non-linear relationship under both methods. All in all, more analysis would be needed to confirm these assumptions, but the results are remarkable: ‘Yes Indeed’ has a decisive impact on the modern period’s trend, yet it is also an important data point that we can’t ignore. Another relevant general observation is that the modern popular rap music contains more tracks with low speechiness than the popular 90s hip-hop, and that these tracks show more fluctuation in danceability. The graphs also confirm one observation made earlier that is important to emphasize, keeping in mind that what we are looking at is relative: the popular tracks in this corpus almost all have high danceability, with only a few tracks (three out of the whole corpus) scoring below 0.5 (one of them at 0.497, which is extremely close to this threshold).
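How much a single extreme point can pull a fitted trend can be shown with a deliberately simplified example. The sketch below uses an ordinary least-squares line, not LOESS or GAM, purely because it is easy to compute by hand; the `ols_slope` helper, the six regular points and the outlier are all hypothetical and are not the corpus values.

```python
def ols_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical tracks: danceability barely changes with speechiness.
speechiness = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30]
danceability = [0.80, 0.75, 0.82, 0.78, 0.81, 0.77]

# Hypothetical outlier with very high speechiness AND danceability.
outlier_speechiness, outlier_danceability = 0.53, 0.96

slope_without = ols_slope(speechiness, danceability)
slope_with = ols_slope(
    speechiness + [outlier_speechiness],
    danceability + [outlier_danceability],
)

print(f"slope without outlier: {slope_without:.3f}")
print(f"slope with outlier:    {slope_with:.3f}")
```

Without the outlier the fitted line is almost flat; adding one extreme point makes it clearly positive. Smoothers differ in how strongly they react to such a point, which is one reason to compare more than one method.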
Timbre is everything about a sound that is separate from pitch, duration and volume. It can be seen as tone color or tone quality. For example, think of a violin and a piano both playing the note C4: the tone could have the same pitch, the same duration and the same volume but will still differ in timbre. Timbre is a comprehensive concept that is sometimes difficult to put into words and can therefore be difficult to analyse. This graph contains the 12 timbre coefficients that Spotify uses to analyse an audio file. They are fairly ambiguous to interpret: the only people who have assigned a real meaning to them are Spotify engineers, and they’ve kept the exact meaning of these values internal. At a quick glance the two groups look very similar, but a closer look reveals differences in shape and range in certain coefficients, for example c02, c05, c07, c08 and c11, which suggests that the timbre of the two groups differs. The first coefficient is mainly based on loudness, so even though volume usually isn’t considered part of timbre, Spotify uses loudness as one of its timbre coefficients. Take a good look at c01 before going to the next page, or feel free to come back to this page after visiting the next. Although small, this graph reflects the distribution of the loudness values of this corpus quite accurately, and it is fun to see how the shapes of the boxplots I present on the next page can be recognized in these shapes!
One of my hypotheses was that modern rap music would be louder than 90s hip-hop music. This boxplot shows that the modern rap music has a higher median, but that the old-school period actually has the highest maximum loudness value of the whole corpus! The range of loudness for old-school (from -14.73 to -2.43 dB) is much wider than that of modern rap music (from -9.31 to -3.37 dB, when we leave out the outlier). This indicates that there is greater variability in loudness within 90s hip-hop music; the interquartile ranges of both boxplots are a good visual representation of this. Overall, the boxplots suggest that modern rap music tends to be louder than 90s hip-hop music on average: it has a higher median and a more concentrated distribution of loudness values, with a much smaller IQR. Nevertheless, it is worth mentioning that the third quartile of the ‘old-school boxplot’ spans a wider range than the modern one and, as I mentioned earlier, the loudest track of this corpus is from the 90s. However, the boxplots alone cannot rule out that these tracks were remastered or that other factors affected their loudness, and this could be a decisive reason why the ‘old-school boxplot’ has the highest maximum loudness value. This corpus consists only of the most popular tracks, so it is plausible that these tracks were remastered before they were uploaded to Spotify.
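The quantities read off the boxplots (median, interquartile range and total range) can be computed directly. This is a minimal Python sketch under stated assumptions: the `loudness_summary` helper is hypothetical, and only the minimum and maximum of each list come from the text above; all values in between are invented for illustration. The exact quartile values also depend on the quantile method, so they need not match the boxplot software.

```python
from statistics import median, quantiles


def loudness_summary(values):
    """Return median, interquartile range and total range of loudness values."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles (default 'exclusive' method)
    return {
        "median": median(values),
        "iqr": q3 - q1,
        "range": max(values) - min(values),
    }


# Only the first and last value of each list are taken from the text;
# the values in between are hypothetical.
old_school = [-14.73, -11.2, -9.8, -8.5, -7.9, -6.4, -5.1, -2.43]
modern = [-9.31, -7.4, -6.8, -6.2, -5.9, -5.5, -4.8, -3.37]

old_stats = loudness_summary(old_school)
modern_stats = loudness_summary(modern)

for name, stats in (("old-school", old_stats), ("modern", modern_stats)):
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

With numbers shaped like these, the modern group has the higher median while the old-school group has the wider IQR and range, which is the same pattern described for the boxplots.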
This graph shows the mean tempo of all the tracks. The y-axis displays the standard deviation, which measures how much the data points vary around the mean value. In addition, the graph includes information on two other variables: duration and volume. A first observation is that the standard deviation is very low overall, so we can conclude that the tempo within tracks varies very little. This is what we would expect from hip-hop and rap music in general, and therefore from both periods, as the beats and rhythms are usually very repetitive and steady. It is plausible that this also helps explain why this type of music generally has such high danceability values, as we saw earlier. Besides this, we can see a clear difference between the hip-hop from the 90s and the modern rap music: the old-school period (apart from the 3 outliers on the right) has a noticeably lower mean tempo than the modern rap music. One interesting observation, which we also saw in the valence and energy graph, is that the modern rap music shows a lot of fluctuation compared to the old-school period. There is a steady cluster of data around 140 to 150 BPM but also many other BPM values, whilst almost all the data from the old-school period (besides the 3 outliers with high BPM) sits in a tight cluster from around 80 to roughly 115 BPM. This fluctuation can also be observed in the duration variable: the modern tracks display a wider range of point sizes, including very small ones (which means these songs are very short), in contrast to the old-school tracks, which tend to have a more uniform size.
This observed variability in track duration, with a trend towards shorter tracks in the modern period, may be attributed to a shift in the music industry towards optimizing streaming revenue, as shorter tracks tend to have a higher repeat value which generates more streams and therefore higher profits.
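The mean tempo and standard deviation used in the tempo graph can be sketched per track as follows. This is a minimal Python illustration, assuming each track is described by a list of tempo estimates for its sections; the `tempo_summary` helper and both example tracks are hypothetical and are not taken from the corpus.

```python
from statistics import mean, stdev


def tempo_summary(section_tempi):
    """Return (mean tempo, sample standard deviation) in BPM for one track."""
    return mean(section_tempi), stdev(section_tempi)


# Hypothetical per-section tempo estimates in BPM.
steady_track = [150.0, 150.2, 149.8, 150.0]   # stable beat throughout
drifting_track = [90.0, 96.0, 104.0, 110.0]   # tempo changes between sections

steady_mean, steady_sd = tempo_summary(steady_track)
drifting_mean, drifting_sd = tempo_summary(drifting_track)

print(f"steady:   mean {steady_mean:.1f} BPM, sd {steady_sd:.2f}")
print(f"drifting: mean {drifting_mean:.1f} BPM, sd {drifting_sd:.2f}")
```

A track with a steady beat ends up low on the standard deviation axis, while a track whose tempo estimate changes between sections ends up higher.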
Here we are looking at self-similarity matrices of the song ‘Shoota’ by Playboi Carti featuring Lil Uzi Vert, which was one of the central points in the first graph, where we looked at valence and energy. In my opinion this song accurately represents a shift in the sound and style of hip-hop, reflecting the evolution of the genre and the changing tastes and preferences of its audience. The delivery and flow of the rapping are very different: it has a more melodic, sing-song approach compared with the more straightforward style of old-school hip-hop. ‘Shoota’ also features a somewhat more complex and layered production style than the old-school hip-hop songs, so I think it is an interesting song to analyse in depth using multiple grams. The self-similarity matrices clearly visualise the segmentation of the song: a short intro, followed by a ‘big block’, which is Lil Uzi Vert’s verse, without drums. After this the drums come in and the chorus is rapped by Playboi Carti (this is the ‘second block’). We can hear some extra notes added, the main harmony is shifted up an octave and the drums ‘drop’. This results in both a new timbre section and a new chroma section. After this chorus there is a verse by Playboi Carti (the ‘third block’), and at the end the chorus is repeated (the ‘fourth block’), which both grams show very clearly. I will refer to the specific sections of this song on the following pages, so please do not hesitate to revisit this page if you need a refresher on the song’s structure.
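The idea behind a self-similarity matrix can be sketched in a few lines: every segment of the song is compared with every other segment, and similar segments get a small distance. The Python example below is only an illustration, assuming cosine distance as the comparison measure; the helper names and the three-dimensional feature vectors are hypothetical stand-ins for real chroma or timbre vectors.

```python
from math import sqrt


def cosine_distance(u, v):
    """Cosine distance between two non-zero feature vectors (0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def self_similarity(features):
    """Pairwise distance matrix between all segments of one track."""
    return [[cosine_distance(u, v) for v in features] for u in features]


# Hypothetical feature vectors, one per song section.
segments = [
    [0.9, 0.1, 0.1],    # intro
    [0.2, 0.8, 0.1],    # verse
    [0.1, 0.2, 0.9],    # chorus
    [0.1, 0.25, 0.85],  # repeated chorus
]

ssm = self_similarity(segments)
for row in ssm:
    print(["%.2f" % distance for distance in row])
```

The repeated chorus has a much smaller distance to the first chorus than to the verse, which is what shows up as matching blocks when the matrix is drawn as an image.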
Do you remember the Spotify timbre coefficients we looked at earlier? These are the same coefficients, but now we are looking at them for one specific song. Some interesting observations can be made from this cepstrogram. On close inspection, we can see Playboi Carti’s chorus very clearly! The change in timbre shifts several coefficients (such as c01, c04 and c06) in the middle and at the end of the song, which correspond to the chorus we saw in the self-similarity matrices. When the first chorus ends (a few seconds before the 100-second mark), we can observe a noticeable rise in c02, which can be attributed to the sudden absence of the beat’s 808 (bass): this leads to an increase in brightness (which is roughly what c02 represents). Besides this, we can clearly see that during Lil Uzi Vert’s part, where the drums are absent, c03 and c05 have a relatively high magnitude. Another interesting observation is the short spike in the magnitude of c11 in the song’s intro while, at the same time, c03 drops from a high magnitude to an extremely low one for a few seconds. This may be attributed to Lil Uzi Vert’s sudden appearance and increased vocal prominence in combination with the beat.
We can see that this song is in the key of C# minor, as this key is dominant throughout the whole song. At the first chorus, where the drums come in and Playboi Carti begins rapping, the algorithm has some problems; I think this is due to the sudden presence of the drums, which introduces some inharmonicity. We therefore see a blending of various keys, but C# minor still contains relatively the most energy in this part, and we can see one straight line at this key throughout the whole song. In Playboi Carti’s verse, B major has higher energy than C# minor. This is not very unusual, as B major (B, C♯, D♯, E, F♯, G♯, A♯) is very similar to C# minor (C♯, D♯, E, F♯, G♯, A, B): they differ by only one note (A♯ versus A). After this, the same pattern of key blending returns, providing further evidence of the song’s structural coherence and of C# minor being the key throughout, without modulation. One fun fact I heard while listening to the song and verified by playing the melody on a piano: the track actually starts with a C# minor chord played as a descending arpeggio, moving from G# to E and resolving to C#!
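The claim that B major and C# minor differ by only one note can be checked mechanically. The following Python sketch builds both scales from their interval patterns, assuming the natural minor scale (as listed in the text) and sharp-only note names; the `scale` helper is a hypothetical name introduced for this illustration.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Semitone offsets from the tonic.
MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]
NATURAL_MINOR_STEPS = [0, 2, 3, 5, 7, 8, 10]


def scale(tonic, steps):
    """Return the note names of the scale built on `tonic` with the given steps."""
    root = NOTE_NAMES.index(tonic)
    return [NOTE_NAMES[(root + step) % 12] for step in steps]


b_major = scale("B", MAJOR_STEPS)                 # B C# D# E F# G# A#
c_sharp_minor = scale("C#", NATURAL_MINOR_STEPS)  # C# D# E F# G# A B

shared = set(b_major) & set(c_sharp_minor)
only_b_major = set(b_major) - set(c_sharp_minor)
only_c_sharp_minor = set(c_sharp_minor) - set(b_major)

print("shared notes:", sorted(shared))
print("only in B major:", sorted(only_b_major))
print("only in C# minor:", sorted(only_c_sharp_minor))
```

Six of the seven notes are shared, which makes it understandable that a key-finding algorithm can briefly prefer B major in a section where the A is hardly played.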
This visual representation needs little explanation: the yellow line represents the tempo of the song. As we saw earlier in the graph displaying the mean tempi of all tracks in the corpus, most songs have an extremely low standard deviation and therefore a very stable BPM. ‘Shoota’ is a good example of this: the BPM of the whole song is roughly 150, and it shows almost no variation except for some small jumps, which are due to small changes in the beat (the 808 that fades out or the hi-hat that stops for a section).
This computational research into popular 90s hip-hop and modern rap music has revealed several interesting insights into the differences and similarities between the two periods. Using a variety of computational musicology techniques, we were able to identify some key features of each period’s music and explore the ways in which they differ and overlap. Some findings were expected, such as the BPM having a low standard deviation; others were very surprising, such as the old-school period having the higher maximum loudness value. The first few graphs showed us some interesting trends, like the recurring pattern of more fluctuation in the modern rap data compared to the 90s hip-hop. One possible explanation could be the development of the rap genre through the years and the way the music industry has expanded, but other types of research would be needed to draw such conclusions. Another interesting topic for further research is the relationship between danceability and speechiness in modern rap, as we saw one remarkable outlier that contradicted some observations. The modern rap showed a decrease in speechiness and high danceability values overall, but at the same time the track with the highest danceability of the whole corpus also had the highest speechiness. One very important point of discussion is that I’ve only used popular tracks, which was a good way of making objective comparison groups but may also have limited the potential of the research. For example, future research could explore specific sub-genres, which were also hinted at by the clustering, where tracks from both periods were clustered together in the higher levels of the hierarchy.
This way, instead of comparing the different time spans, the differences between sub-genres could become the topic of research, where music from different time spans that is alike (which would need more computational analysis) could provide insight into the development and emergence of certain sub-genres. One thing I was very interested in was how the time spans differ. One of the main findings was that timbre is an important feature for distinguishing between the two time periods. Additionally, the clustering aligned with the time periods after the pitch features were removed. There are many possible explanations for why the timbre differs: the use of autotune and other pitch correction techniques (one assumption I’ve made is that the decrease in speechiness we saw in the most popular modern music is due to the increase of autotune and sing-song elements in rap), the integration of more synthesized sounds and electronic production, and with that a shift towards using fewer samples in modern rap production. In conclusion, future research could expand on this study by examining sub-genres within hip-hop and rap, by extending the time period to cover the entire history of rap music, and by using a bigger dataset.